ABSTRACT


Web performance measurements and availability tests have 
been carried out using a variety of infrastructures over the 
last several years. Disruptions in the Internet can lead to 
Web sites being unavailable or increase user-perceived la- 
tency. The unavailability could be due to DNS, failures in 
segments of the physical network cutting off thousands of 
users, or attacks. Prompt reactions to network-wide events 
can be facilitated by local or remote measurement and mon- 
itoring. Better yet, a distributed set of intercommunicating 
measurement and monitoring entities that react to events 
dynamically could go a long way toward handling disruptions.

We have designed and built ATMEN, a triggered measure- 
ment infrastructure to communicate and coordinate across 
various administrative entities. ATMEN nodes can trigger 
new measurements, query ongoing passive measurements or 
historical measurements stored on remote nodes, and coordi- 
nate the responses to make local decisions. ATMEN reduces 
wasted measurements by judiciously reusing measurements 
along three axes: spatial, temporal, and application. 

We describe the use of ATMEN for key Web applications 
such as performance based ranking of popular Web sites 
and availability of DNS servers on which most Web transac- 
tions are dependent. The evaluation of ATMEN is done us- 
ing multiple network monitoring entities called Gigascopes 
installed across the USA, measurement data of a popular 
network application involving millions of users distributed 
across the Internet, and scores of clients to aid in gathering 
measurement information upon demand. Our results show 
that such a system can be built in a scalable fashion. 


Categories and Subject Descriptors 


C.2.3 [Communication Networks]: Network Operations— 
Network monitoring; C.2.4 [Communication Networks]: 
Distributed Systems— Distributed applications; 

C.4 [Performance of Systems]: Reliability, availability, 
and serviceability 


General Terms 


Design, Experimentation, Measurement, Performance 


1. INTRODUCTION


Numerous Web performance measurement infrastructures 
have been built over the years. Entities such as Keynote [15] 
routinely measure availability and performance of popular 
Web servers. Content Distribution companies such as Aka- 
mai perform internal measurements to balance load on their 
large set of servers. Measuring and monitoring infrastruc- 
tures perform the role of watching numerous servers typi- 
cally by periodically polling them. 

Beyond failures at the individual server sites, disruptions 
in the Internet can lead to sites being unavailable or increase 
user-perceived latency. Disruptions could be at any of the 
protocol layers directly involved in the Web transaction such 
as DNS or TCP. But network failure events can cause a 
large collection of clients to be cut off from accessing the 
site even though the Web site itself is functioning flawlessly. 
Constant measurement and monitoring of collections of Web 
servers is thus an important task to ensure performance and 
availability of Web sites. 

Current monitoring techniques tend to use distributed 
infrastructures that coordinate internally in a proprietary 
fashion. For example, ISPs measure and log information lo- 
cally and correlate on an intra-net basis. However, rarely is 
there correlation of such information with other monitoring 
sites. Correlation of measurement and monitoring informa- 
tion gathered elsewhere can aid in handling network events 
such as worms and flash crowds [14]. Geographically
distributed sets of machines [26] and network beacons [19] are
available as testbeds for such purposes. Hardware-
based packet monitors [5] and associated software can pro- 
cess and analyze high volumes of data. Traditional mon- 
itoring devices inside a small administrative entity report 
all events to a centralized repository. The volume of traffic,
however, prevents this approach from scaling to larger
administrative entities, or even to selective sharing across
the Internet between multiple administrative entities. Thus,
there is a need for a distributed set of intercommunicating 
measurement entities that can dynamically react to events 
and gather application-specific measurement data. 

In this paper we report on the design and development of 
ATMEN—a triggered measurement infrastructure for com- 
municating and coordinating across various administrative 
entities. ATMEN nodes are capable of triggering new
measurements on remote nodes and querying ongoing passive
and existing historical measurements. Measurements can be
selectively turned on and off for specific durations of time
on a subset of cooperating measurement sites based
on the occurrence of one or more events. Selective triggers,
tailored to local needs and written to exploit the different
capabilities of multiple data sites, are a more practical, low-cost
solution. Multiple queries can be run simultaneously in the
presence of live traffic. The results of the queries are
coordinated before making local decisions in conjunction with
local information. Monitoring Web sites for availability and
for performance are just two exemplar applications of ATMEN.

A key goal of ATMEN is to avoid wasted measurements 
by judiciously reusing measurements. The same parame- 
ter is often measured by multiple applications, and most 
measured parameters exhibit varying degrees of stationarity 
across time and space. Earlier proposals include techniques 
for reducing measurements in the form of a single measure- 
ment entity or addition of more primitives into the network 
to reduce the measurements that overlay services have to 
make [22],[12]. However, such limited forms of measure- 
ment reuse do not envision temporal and spatial stationarity 
or reuse across applications. Commonalities across clusters 
of measurement sites may allow the measurements to be 
reused across space. Many applications share certain com- 
mon primitive measurement components and these may be 
reusable by other applications. For example, many applica- 
tions on the Internet interested in performance may measure 
a DNS component which may be reusable across applications 
if there is not considerable variance in the timescale of in- 
terest. Reuse in ATMEN is thus along three axes: spatial, 
temporal, and application. 

We began with a preliminary measurement study to ex- 
amine spatial and temporal stationarity of the constituent 
elements of measurement, and the ability to partially reuse 
application-level measurements. The results convinced us of 
the potential viability of our approach. This paper describes 
the design, implementation, and evaluation of ATMEN. 

The main contributions of our work are: 


1. The design and construction of a triggered measure- 
ment infrastructure involving hardware and software 
components, historical collections of past measurements 
and a protocol for communication between the nodes. 
The infrastructure provides the basis for quick, low-cost
monitoring in support of a variety of application-level tasks.


2. Identifying the axes of possible reuse of network mea- 
surements to reduce wasted measurements. 


3. A demonstration of the feasibility of a system reusing 
measurements across time, location, and application. 


The demonstration, a key focus of this paper, is via two 
distinct experiments. The first experiment monitors down- 
load times of popular Websites from numerous PlanetLab 
client sites. Simultaneously, network logs of a popular in- 
teractive application are polled to look for problem areas. 
Using hints of network delays in certain segments of the 
Internet from the interactive application, additional down- 
load measurements are triggered in the Web site monitoring 
application. The second experiment monitors DNS server 
availability through multiple ATMEN nodes passively
measuring DNS traffic. When the number of DNS responses is
less than the number of requests at any of the nodes for any
DNS name, a set of PlanetLab nodes is used to check the
status of the DNS server for that DNS name. Our primary
aim is to demonstrate the low overhead of ATMEN. 


2. RELATED WORK 


ATMEN is the first comprehensive application-independent 
architecture for triggered measurements and measurement 
reuse. Prior work focused on a particular application do- 
main, was limited to one type of measurement, or lacked a 
model to compose diverse measurements. Both [22] and 
[12] focus on reducing the amount of measurements overlay 
networks have to make. They do not consider the spatial 
and temporal stationarity of measurements and the reuse of 
measurements for anything other than overlay applications. 
Other application domains that have seen increased atten- 
tion lately are distributed Intrusion Detection [2], large scale 
worm detection [4], Internet Storm Center [11] for widespread 
attack detection, distributed attack suppression [25], and 
joint analysis of firewall logs [7]. The proposed solutions are 
application-specific and ignore spatial and temporal stationarity
of measurements. Similarly, services such as Keynote [15]
and its affiliated services (Internet health report [10] and 
Netmechanic [24]), and Multicast beacons [19] conduct var- 
ious performance measurements using a distributed infra- 
structure but do not provide a system which would allow 
selective reuse or triggering of measurements by third par- 
ties. There also exist standalone systems, such as eValid [8], 
that provide Website capacity performance analysis. 

EtE monitor [3] measures Web site performance by pas- 
sive accumulation of packet traces from a Web server, re- 
constructing the accesses, and presenting both network and 
server overhead by a combination of statistical filtering meth- 
ods as heuristics. The monitor can be placed at a Web server 
or at any point in the network where it can examine traffic 
associated with a Web server. The authors do not discuss 
the possibility of monitors communicating with each other. 
ANEMOS [6] enables scheduling of measurements, and pro- 
vides tools for automation of collection and analysis of mea- 
surements. However, it requires a single administrative unit 
to have complete control over the system and does not en- 
able interaction of applications and measurement entities 
across administrative domains. 

A proposal for distributed measurements [28] in the peer- 
to-peer paradigm to measure properties of a network path 
between two participants does not address other measure- 
ment types such as HTTP download time nor does it present 
a way to reuse the gathered data. A similar end-system 
based approach is proposed by [13]. This system passively 
gathers various statistics from the end host’s local network 
and reports them to a central server. This system is also 
limited in the type of data it can gather and in addition we 
do not believe that such a fully centralized approach will 
scale. 

SPAND [27] is similar to ATMEN in that it reuses pas- 
sive measurements over space and time to improve replica 
selection. SPAND shows the benefits of sharing network 
measurements across time but does not share the general 
framework provided by ATMEN. Our survey of related work 
indicates that there are many non-interacting application 
specific solutions for distributed measurements. Therefore, 
we feel the time is ripe to combine these application specific 
solutions within a common framework. 

Figure 1: Basic architecture of ATMEN


3. DESIGN OF ATMEN 


Before implementing ATMEN, we identified the basic ar- 
chitecture of an application-independent triggered measure- 
ment infrastructure. There are primarily two kinds of enti- 
ties in ATMEN - applications (App) that require measure- 
ment data for some particular parameter and sources (DS) 
which provide measurement data for some parameter. Here, 
we employ the generic term parameter to refer to any in- 
dividual type of measurement data. Since ATMEN spans 
multiple administrative entities, it is not feasible to require
every participating entity to construct Apps and DSes using
some standard code base. A practical solution is to instead 
define the language, i.e., the protocol for communication 
between different entities in the system. All entities would 
only need to ensure that their Apps and DSes can speak and 
understand this protocol. To develop a protocol suitable for 
ATMEN, we consider the flow of control in any application 
that makes use of triggered measurements. 


repeat forever or upon a client request {
    Request data for a set of parameters P from DSes S
    if (event e is detected) {
        Request data for a set of parameters P’ from DSes S’
    }
}
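
As an illustration, one iteration of this control flow can be sketched in Python. This is a minimal sketch under stated assumptions, not ATMEN's implementation: the function name run_round, the request_data and detect_event callables, and the representation of a request as a (DS, parameter) pair are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical request type: (data source name, parameter name).
Request = Tuple[str, str]
Results = Dict[Request, float]


def run_round(
    request_data: Callable[[str, str], float],
    baseline: Iterable[Request],
    triggered: Iterable[Request],
    detect_event: Callable[[Results], bool],
) -> List[Results]:
    """Run one loop iteration: issue the baseline requests (P from S), and
    issue the triggered requests (P' from S') only if an event is detected."""
    results = {(ds, p): request_data(ds, p) for ds, p in baseline}
    rounds = [results]
    if detect_event(results):
        rounds.append({(ds, p): request_data(ds, p) for ds, p in triggered})
    return rounds
```

The caller decides what counts as an event; for example, detect_event could return true when any baseline value exceeds a latency bound.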


ATMEN needs to provide the following three basic prim- 
itives to support such an application. 


1. Any DS should be able to advertise its existence and 
its measurement capabilities, i.e., the parameters for 
which it can provide measurement data and the lan- 
guage it supports for querying on these measurements. 


2. Any App should be able to discover the existence of 
the DSes which provide measurement data for its pa- 
rameter of interest. 


3. An App should be able to send a request for data to 
the relevant DS and the DS should be able to send 
back a response. 


Keeping these primitives in mind, we introduce a name 
server into the system to which DSes would send their adver- 
tisements and from which Apps would learn the existence of 
DSes relevant to them. To support the three primitives, we
propose the use of six types of messages in ATMEN’s com- 
munication protocol — POST, DEL, GET, GET-REPLY, 
REQ and RESP. The site hosting a DS advertises its ex- 
istence by sending a POST message to the name server, 
and can withdraw from the system by sending a DEL mes- 
sage to the name server. An App can learn about DSes 
providing measurement data of interest by sending an ap- 
propriate GET message to the name server and receiving 
a GET-REPLY message in response. The App then sends 
a REQ message to the site hosting the appropriate DS, re- 
questing measurement data. The DS sends data back to the 
site hosting the App, in a RESP message. The architecture 
of ATMEN is shown in Figure 1. 

A key objective in the design of ATMEN is to make it 
application and measurement data independent. We chose 
XML as the language in which the messages are specified. 
We also require that all communication be over TCP con- 
nections. We now formalize each of the six messages: 

POST: The POST message lets a DS advertise
itself to the rest of the system. To account for the fact that
the site hosting a DS could go down or become inaccessi- 
ble, DSes are expired from the name server after a default 
timeout interval. This in turn requires DSes to send POST 
messages to the name server periodically to remain in the 
system. 

To enable Apps to request data from this DS, the POST 
message must contain the parameter param for which the DS 
supplies measurement data and the address ip_address:port 
to which requests for this data should be sent. How does 
an App comprehend the semantics of the name param for 
the concerned DS? We believe that all measurements can 
be expressed in terms of a globally understood namespace 
of commonly used parameters. To resolve this issue, we also 
include a string descr in the POST message which gives a 
precise description of how the DS measured the parameter 
param. For example, the description for DNS_lookup_time 
measured by some DS could be the time taken by a call to 
gethostbyname() within a program invoking it. 

In most cases, just requesting data for parameter param 
does not make much sense. For example, in the case of the 
parameter DNS lookup time, no value can be determined for 
this parameter unless the address prefix the measurement 
was made from, the DNS name for which the measurement 
was made and the time at which the measurement was made 
are specified. Hence, the POST message also includes the 
arguments that characterize any particular instance of the 
measurement. In the case of DNS lookup time, these arguments
would be src_prefix, dns_name and timestamp.

Apart from providing raw measurement values, a DS may 
support some language for querying on the measurement 
data. We could either require all DSes to support the same
language for querying or include the format of the query 
language for the particular DS in the POST message. Both 
of these are too restrictive, and so we chose an approach in 
between these two extremes. We believe that the query lan- 
guage supported by most DSes would be decomposable into 
a standard set of capabilities, performing the basic functions 
of selection, transformation, and aggregation. To advertise 
these capabilities, the POST message would include a capa- 
bility vector for the capabilities supported by the DS. 

DEL: Because POST entries time out at the name
server, DSes that are no longer active do not
remain in the system. However, DSes might also wish to
withdraw themselves from the system before this timeout
period expires, and the DEL message facilitates this. The name
server needs to ensure that the entity that sends the DEL 
message for a particular DS is the same as the one that sent 
the POST message for that DS. 

GET: The GET message enables Apps to learn the exis- 
tence of DSes providing measurement data for their parame- 
ter of interest from the name server. GET should include the 
name of the parameter for which a list of DSes is required. 

GET-REPLY: In response to a GET message with pa- 
rameter param, the name server returns a GET-REPLY 
message that contains all the entries of the POST messages 
of all DSes providing data for the same parameter param. 
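
The name-server side of POST, DEL, GET and GET-REPLY can be sketched as a small in-memory registry. This is a minimal sketch under assumptions: the class and field names are hypothetical, the 300-second timeout is an arbitrary placeholder for the default timeout interval, and the sender string stands in for whatever mechanism the name server uses to identify the entity that sent a POST.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Advert:
    """Contents of one POST message (field names are illustrative)."""
    address: str
    port: int
    param: str
    descr: str = ""
    posted_at: float = 0.0
    sender: str = ""


class NameServer:
    def __init__(self, timeout: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.timeout = timeout  # assumed expiry interval, in seconds
        self.clock = clock
        self._ads: Dict[Tuple[str, int, str], Advert] = {}

    def post(self, ad: Advert, sender: str) -> None:
        """POST: register an advertisement, or refresh an existing one."""
        ad.posted_at = self.clock()
        ad.sender = sender
        self._ads[(ad.address, ad.port, ad.param)] = ad

    def delete(self, address: str, port: int, param: str, sender: str) -> bool:
        """DEL: withdraw an advertisement, but only for the original poster."""
        key = (address, port, param)
        ad = self._ads.get(key)
        if ad is None or ad.sender != sender:
            return False
        del self._ads[key]
        return True

    def get(self, param: str) -> List[Advert]:
        """GET / GET-REPLY: drop expired entries, return matches for param."""
        now = self.clock()
        self._ads = {k: a for k, a in self._ads.items()
                     if now - a.posted_at < self.timeout}
        return [a for a in self._ads.values() if a.param == param]
```

Because entries expire unless refreshed, a DS that wants to stay visible simply calls post again before the timeout elapses.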

REQ: The REQ message is sent by an App to a partic- 
ular DS to request data from that DS. This message is sent 
over a TCP connection set up with the address:port
specified in the POST message for that DS. The REQ message
specifies the name of the parameter param for which mea- 
surement data is solicited and a set of values for each of the 
arguments specified in the POST message for this DS. The 
REQ message should also specify the capabilities that need 
to be enforced along with appropriate values as arguments 
for these capabilities. 

RESP: The DS sends back its response in a RESP
message with a list of tuples. Each tuple comprises a value
for the measurement parameter that data was requested for 
along with the values taken by the arguments for the par- 
ticular instance of the measurement that yielded this value. 
Since ATMEN is a cooperative best-effort system, the DS 
might choose not to return any data. The RESP message 
would then include an error code with the reason for not 
returning data. Such reasons include invalid values for the 
arguments, the requested data being unavailable, the server 
being overloaded, or refusal to provide data to a specific IP 
address based on some administrative policy. 

Each DS can enforce resource limits to limit the amount 
of data produced or the number of clock cycles consumed 
by any single query. In some special cases, the resources a 
query might consume can be determined prior to beginning 
its execution, but this is not possible in general. 

Figure 2 shows the exact format of the POST, REQ and 
RESP messages in the context of a particular App-DS inter- 
action. The DS measures DNS lookup times from the prefix 
123.156.0.0/16 for a set of DNS names www.foo.com, ..., 
www.bar.com, as specified by the src_prefix and dns_name 
arguments. The timestamp argument does not have any 
value indicating that it accepts requests for historical mea- 
surements made any time in the past. The DS uses the 
httperf [21] application for measuring DNS lookup times and 
does so by measuring the time taken by the gethostbyname()
call within httperf. This DS supports two capabilities. The 
threshold capability implies that given a threshold value, the 
DS can supply values for all measurement instances with the 
required values for the arguments such that the value of the 
parameter is greater than the threshold. The count capabil- 
ity implies that given a set of values that the arguments can 
take, instead of providing the values of the measurement 
parameter, the DS can provide a count of the number of 
measurement instances with these values for the arguments. 
The App makes a request for instances of DNS lookup time 
measurements, made for one of the DNS names within some 
specific interval of time, that were greater than 3 seconds. 
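
The threshold and count capabilities described above can be illustrated over an in-memory list of measurements. This is a minimal sketch, not the DS's actual query engine: the Measurement type and the query function are hypothetical, and allowing count to be combined with threshold is an assumption of this sketch.

```python
from typing import List, NamedTuple, Optional, Tuple, Union


class Measurement(NamedTuple):
    value: float      # DNS lookup time in seconds
    src_prefix: str
    dns_name: str
    timestamp: int    # seconds since the epoch


def query(
    store: List[Measurement],
    dns_name: str,
    interval: Tuple[int, int],
    threshold: Optional[float] = None,
    count: bool = False,
) -> Union[int, List[Measurement]]:
    """Select measurements matching the arguments, then apply capabilities."""
    lo, hi = interval
    hits = [m for m in store
            if m.dns_name == dns_name and lo <= m.timestamp <= hi]
    if threshold is not None:
        # threshold capability: keep only values greater than the threshold
        hits = [m for m in hits if m.value > threshold]
    # count capability: return the number of instances instead of the values
    return len(hits) if count else hits
```

With the values used in Figure 2, a request for www.bar.com in the interval [1089210957, 1091038673] with a threshold of 3 would select only the measurement of 4.037 seconds.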

<POST>
<SITE> 
<ADDRESS>123.156.101.202</ADDRESS> 
<PORT>12345</PORT> 
</SITE> 
<PARAM> DNS_lookup_time 
<ARG value="src_prefix">123.156.0.0/16</ARG> 
<ARG value="dns_name">www.foo.com, ....., www.bar.com</ARG> 
<ARG value="timestamp"></ARG> 
</PARAM> 
<DESCR>Time taken by call to gethostbyname in httperf</DESCR>
<CAP value="threshold"></CAP> 
<CAP value="count"></CAP> 
</POST> 

(a) 
<REQ> 
<PARAM>DNS_lookup_time 
<ARG value="src_prefix">123.156.0.0/16</ARG> 
<ARG value="dns_name">www.bar.com</ARG> 
<ARG value="timestamp">[1089210957, 1091038673]</ARG> 
</PARAM> 
<CAP value="threshold">3</CAP> 
</REQ> 

(b) 
<RESP> 
<TUPLE>4.037 
<ARG value="src_prefix">123.156.0.0/16</ARG> 
<ARG value="dns_name">www.bar.com</ARG> 
<ARG value="timestamp">1090078345</ARG> 
</TUPLE> 
</RESP> 

(c) 


Figure 2: (a) POST (b) REQ (c) RESP 
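
To make the message layout in Figure 2 concrete, the following sketch builds a REQ message and parses a RESP message using Python's standard xml.etree.ElementTree module. The helper names build_req and parse_resp are hypothetical, and the sketch handles only the elements that appear in the figure; it does not cover error codes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def build_req(param: str, args: Dict[str, str], caps: Dict[str, str]) -> str:
    """Serialize a REQ message with one PARAM, its ARGs, and any CAPs."""
    req = ET.Element("REQ")
    p = ET.SubElement(req, "PARAM")
    p.text = param
    for name, val in args.items():
        ET.SubElement(p, "ARG", value=name).text = val
    for name, val in caps.items():
        ET.SubElement(req, "CAP", value=name).text = val
    return ET.tostring(req, encoding="unicode")


def parse_resp(xml_text: str) -> List[Tuple[float, Dict[str, str]]]:
    """Return (value, arguments) for every TUPLE in a RESP message."""
    out = []
    for t in ET.fromstring(xml_text).findall("TUPLE"):
        args = {a.get("value"): (a.text or "").strip()
                for a in t.findall("ARG")}
        out.append((float((t.text or "").strip()), args))
    return out
```

The measured value is the text that precedes the first ARG child of a TUPLE, which is why parse_resp reads it from the element's leading text.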


4. IMPLEMENTATION 


We implemented a realization of the ATMEN architecture 
described above. We could choose any possible implemen- 
tation for the name server, the Apps and the DSes as long 
as they communicate in conformance with the protocol de- 
scribed in the previous section. The two primary objectives 
that we had in mind while deciding on one particular imple- 
mentation were extensibility and minimizing communication 
overhead. The system should permit new Apps to make use 
of the infrastructure with ease and adding new DSes should 
be easy for any site. The second objective of reducing com- 
munication, useful in any system, is all the more significant 
in a triggered measurement setup as it is critical to be able 
to react to events quickly. 

The components in our implementation of ATMEN are 
shown in Figure 3. To simplify addition of new Apps as well 
as new DSes, we have interfaces on nodes which host Apps 
as well as on those which host DSes. We refer to these as 
the Application Measurement Interface (AMI) and the Data 
Source Measurement Interface (DSMI), respectively. When 
an App needs to retrieve some measurement data, it issues 
the request to the local AMI which sends a REQ message 
to the DSMI on the appropriate node. The DSMI retrieves 
the data from the appropriate DS, and sends back a RESP 
message to the AMI, which forwards it to the App. We next 
present details of our implementation of the name server, the 
AMI and the DSMI. 

Figure 3: Our implementation of ATMEN

The name server runs a TCP server at a globally known
address:port. DSes advertise themselves by setting up a
TCP connection with the name server and sending a POST
message on the connection. Similarly, Apps set up a TCP
connection, send a GET message and receive a GET-REPLY
message. The administrator of the site hosting a DS runs
a script to send out the advertisement for that DS.
Name server resiliency is vital, as the name server must not be a
single point of failure for the whole system. It would have to
be implemented in a hierarchical manner, much in the same
way as the DNS root servers are currently implemented.

The AMI is realized as a C library to minimize system 
overhead; Apps can link with it to access the exported API. 
The interface exported by the AMI essentially has two calls — 
getDSes and getMeasurementValue. The App can first learn
about the existence of DSes it is interested in by making 
a getDSes call on the AMI. Based on the parameter param 
passed to it by the App, the AMI constructs an appropriate 
GET message and sends it to the name server. The AMI 
then parses the GET-REPLY message returned by the name 
server. The AMI returns to the App an array of an abstract 
datatype, called DStype, which is defined in the header file 
of the AMI that Apps would include. DStype captures all 
the properties a DS could specify in its POST message. 

The App then makes a getMeasurementValue call on the
AMI to request measurement data. Based on the DStype 
object passed as argument, the AMI constructs a REQ mes- 
sage and sends it to the appropriate DS. To reduce the con- 
nection setup overhead, the AMI maintains persistent TCP 
connections, timing them out if unused for a long period of 
time. To enable the App to make multiple requests in parallel
without having to use threads, the getMeasurementValue
call on the AMI returns a file descriptor to the App as soon 
as the REQ message is sent out successfully. A separate 
thread within the AMI polls all the connections on which 
responses are expected. When a RESP message is received, 
the AMI identifies the file descriptor to which the tuples re- 
ceived should be written out. Since multiple requests and 
responses are sent and received over the same TCP con- 
nection, the AMI implements appropriate locking to ensure 
that there is at most one outstanding request to a given DS 
at a given time. 
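
The idea of returning a file descriptor immediately while the response arrives in the background can be sketched in Python. This sketch deliberately differs from the AMI described above: the AMI is a C library with a single thread polling all connections, whereas this illustration starts one thread per request. The function get_measurement_value and the fetch callable (standing in for the REQ/RESP exchange over TCP) are hypothetical.

```python
import os
import threading
from typing import Callable, Dict

_guard = threading.Lock()
_ds_locks: Dict[str, threading.Lock] = {}


def _lock_for(ds_addr: str) -> threading.Lock:
    """Return the lock that serializes requests to one DS."""
    with _guard:
        return _ds_locks.setdefault(ds_addr, threading.Lock())


def get_measurement_value(ds_addr: str, fetch: Callable[[str], str]) -> int:
    """Return a readable file descriptor at once; a background thread
    writes the response text to it when `fetch` completes."""
    read_fd, write_fd = os.pipe()

    def worker() -> None:
        try:
            with _lock_for(ds_addr):   # at most one outstanding request per DS
                data = fetch(ds_addr)  # stands in for REQ/RESP over TCP
            os.write(write_fd, data.encode())
        finally:
            os.close(write_fd)         # EOF tells the reader we are done

    threading.Thread(target=worker, daemon=True).start()
    return read_fd
```

An App can issue several such calls back to back and then read from the returned descriptors, for example with select, without managing threads itself.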

On the site hosting the DS, the DSMI runs a TCP server 
at the port advertised in the POST message. When it re- 
ceives a REQ message, the DSMI parses the message and 
launches the appropriate command to obtain the required 
measurement data. The DSMI then reads in the output of 
the command, constructs the corresponding RESP message
and sends it back on the connection on which it received 
the REQ message. In comparison with the latency suffered 
by the REQ and RESP messages in traversing the Internet 
and the time taken to execute the process that provides the 
measurement data, the overhead due to the AMI and DSMI 
is extremely low, as we show in our results later. 
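
The DSMI's request handling (launch a command, read its output, wrap it in a RESP message) can be sketched as follows. This is a minimal sketch under assumptions: the handle_req function and the mapping from parameter names to commands are hypothetical, each output line is treated as one measured value, and the ERROR element is an illustrative placeholder because the exact format of error codes is not shown in Figure 2.

```python
import subprocess
import sys
from typing import Dict, List
from xml.sax.saxutils import escape


def handle_req(param: str, commands: Dict[str, List[str]]) -> str:
    """Run the command registered for `param` and wrap each non-empty
    output line in a TUPLE element of a RESP message."""
    argv = commands.get(param)
    if argv is None:
        # Hypothetical error format: the requested data is unavailable.
        return "<RESP><ERROR>data unavailable</ERROR></RESP>"
    out = subprocess.run(argv, capture_output=True, text=True,
                         check=True).stdout
    tuples = "".join(f"<TUPLE>{escape(line)}</TUPLE>"
                     for line in out.splitlines() if line)
    return f"<RESP>{tuples}</RESP>"
```

For instance, registering the command [sys.executable, "-c", "print('4.037')"] under DNS_lookup_time yields a RESP message containing a single tuple with the value 4.037.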


5. REUSE OF MEASUREMENTS 


Sharing of measurements in the context of a triggered net- 
work measurement infrastructure enables reuse across time, 
space, and application. This is based on the observation that 
many measurement parameters display some degree of tem- 
poral and spatial stationarity and many of the components 
measured are the same across applications. We present a 
rough outline of how this can be achieved to reduce the 
number of measurements that need to be made. 

Suppose an application needs the value for measurement
of some parameter $Z$ from a site $s^*$ at time $t^*$. $Z$ can be a
function $f(X_1, \ldots, X_n)$ of some $n$ component measurements
$X_1, \ldots, X_n$. Several historical measurements might exist for
each of these components, $X_i$, as part of measurements of
the same/different parameter by the same/different application.
To reuse measurements, we need to characterize
the variability of each component with respect to space and
time with probability distributions $P_i^s$ and $P_i^t$, respectively.
$P_i^t(t^* \mid t)$ is the probability of the measurement made at time
$t$ being still valid at time $t^*$, while $P_i^s(s^* \mid s)$ is the
probability of the measurement made at site $s$ being valid at site
$s^*$. These probability distributions have to be determined a
priori based on measurement studies of each component.

Based on these probability distributions, we can com- 
pute the confidence value associated with each historical 
measurement and choose the value with which the highest 
value of confidence is associated. Assuming that historical
measurements exist for all the component measurements
($X_1, X_2, \ldots, X_n$), a value of Z can be inferred and the
confidence interval associated with this value can be computed.
If the range of this interval is below a chosen threshold frac- 
tion of the inferred value of Z, then additional measurements 
are unnecessary; the inferred value for Z can be used. This 
threshold would be chosen based on the quality of measure- 
ment desired by the application or by the intrinsic variability 
associated with Z (the variation observed in measurements 
made at the same time at the same site). On the other 
hand, if the “quality” of the inferred value is deemed to be 
low, new measurements have to be triggered for a subset of 
the components ($X_1, \ldots, X_n$). This subset would be
determined based on the costs associated with measuring each of
these components and the increase in “quality” desired in 
the value for parameter Z. 
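
The selection of a historical measurement and the decision to trigger new measurements can be sketched as below. This is a minimal sketch under a loudly stated assumption: the text does not specify how the temporal and spatial validity probabilities are combined, so the sketch multiplies them, which treats the two as independent. The Sample type and both function names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    value: float
    site: str
    time: float


def best_sample(
    history: List[Sample],
    site: str,
    now: float,
    p_time: Callable[[float, float], float],  # validity at `now` of a sample from time t
    p_space: Callable[[str, str], float],     # validity at `site` of a sample from site s
) -> Optional[Tuple[Sample, float]]:
    """Return the historical sample with the highest confidence.
    ASSUMPTION: confidence is the product of the two probabilities."""
    scored = [(s, p_time(now, s.time) * p_space(site, s.site))
              for s in history]
    return max(scored, key=lambda pair: pair[1]) if scored else None


def needs_new_measurement(low: float, high: float,
                          inferred: float, frac: float) -> bool:
    """True when the confidence interval [low, high] is wider than the
    chosen threshold fraction of the inferred value."""
    return (high - low) > frac * abs(inferred)
```

The probability functions passed in would come from the a priori measurement studies of each component; the sketch leaves their form entirely to the caller.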

Such an approach towards measurement reuse would work 
for applications where the parameter of interest can be ex- 
pressed as a function of measurable components, the vari- 
ability of each of which can be characterized with respect to 
the time and location where the measurement was made. 

Though the above approach completely captures reuse of 
measurements across space and time within the same appli- 
cation, it can be applied for reuse across applications only 
when measurements can be directly reused. Full reuse of
measurements across applications is applicable only when a
component measured by an application $A_1$ is also a component
of the parameter of interest of another application $A_2$.

On the other hand, partial reuse of measurements across
applications is possible where $A_1$ infers the need to trigger
new measurements based on measurements made by $A_2$. For
such reuse $A_1$ would require a priori knowledge of the impact
that changes in the measurements made by $A_2$ have on the
components it measures. Thus, when $A_1$ measures a component
$X_i$, it would have to store the current estimates of
measurements by $A_2$ that influence $X_i$. When $A_1$ reuses
this measurement of component $X_i$ based on temporal and
spatial stationarity, it has access to the new estimates for
measurements made by $A_2$ that affect $X_i$. Based on the
degree of change compared to the stored estimates, $A_1$ can
infer whether new measurements need to be triggered.
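
The partial-reuse check can be sketched as a comparison between the estimates stored at measurement time and the current ones. This is a minimal sketch under assumptions: the function name should_retrigger, the use of relative change as the "degree of change", and the single tolerance value are all hypothetical choices, not details given in the text.

```python
from typing import Dict


def should_retrigger(
    stored: Dict[str, float],
    current: Dict[str, float],
    tolerance: float,
) -> bool:
    """Compare the estimates from the second application that were stored
    when the first application measured a component with the current
    estimates; ask for a fresh measurement if any of them moved by more
    than `tolerance` (relative change)."""
    for name, old in stored.items():
        new = current.get(name)
        if new is None:
            return True                       # estimate no longer available
        base = abs(old) if old != 0 else 1.0  # avoid division by zero
        if abs(new - old) / base > tolerance:
            return True
    return False
```

For example, with a tolerance of 0.25, a stored round-trip estimate of 40 ms and a current estimate of 80 ms would cause a new measurement to be triggered, while a current estimate of 42 ms would not.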


6. EXPERIMENTS 


We deployed and evaluated two measurement applications 
that can take advantage of triggered measurements. Our ob- 
jective was not only to highlight the applicability of a trig- 
gered measurement infrastructure such as ATMEN, but also 
to choose applications which are in some way representative 
of applications that make use of Internet measurements. For 
this, we first classified all measurement applications based 
on their parameter of interest. We then implemented and 
deployed one application each from two of these categories. 

Monitoring DNS server availability: This is an ex- 
ample of applications that are interested in the success/failure 
of a measurement. We use passive network monitoring to 
detect the failure of a DNS lookup for a particular DNS 
name. If we detect such a failure we trigger active mea- 
surements from a set of geographically distributed sites to 
check if the DNS server for that particular DNS name is 
truly unreachable. This experiment allows us to monitor a 
large number of DNS names for availability with a minimal 
amount of active measurements. 
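
The passive trigger condition for this application (fewer DNS responses than requests for a name) can be sketched as follows. This is a minimal sketch: the names_to_probe function and the representation of observed traffic as (kind, DNS name) pairs are hypothetical and do not reflect how the passive monitors expose their data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def names_to_probe(events: Iterable[Tuple[str, str]]) -> List[str]:
    """events are (kind, dns_name) pairs, kind being 'request' or 'response'.
    Return, in sorted order, the names with fewer responses than requests;
    these are the candidates for triggered active measurements."""
    requests: Counter = Counter()
    responses: Counter = Counter()
    for kind, name in events:
        if kind == "request":
            requests[name] += 1
        elif kind == "response":
            responses[name] += 1
    return sorted(n for n, c in requests.items() if responses[n] < c)
```

Only the names returned by such a check need active probing, which is what keeps the number of active measurements small.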

Performance-based ranking: This is an example of 
applications that are interested in the value/result of a mea- 
surement. We monitor the performance of a set of Web sites 
by ranking them in the order of the download times as mea- 
sured from a distributed set of sites. As download time can 
be broken down into DNS, TCP, and HTTP components, 
there is significant potential for measurement reuse. Tem- 
poral and spatial stationarity of these components can be 
taken advantage of to reduce the number of measurements 
required. To ensure that disruptions in the network are not 
missed, measurements made in other applications can be 
used to trigger new measurements. 

Applications in other categories where the parameter of 
interest is either some aggregate/summary statistic (such as 
total traffic per port or average link utilization), or the out- 
put/content of a measurement (such as IP address returned 
by a DNS lookup or Web page returned by an HTTP transfer), 
can also benefit from ATMEN. 


6.1 Study of Measurement Reuse Potential 


We first studied the potential for measurement reuse in 
the performance-based ranking application. Download time 
can be decomposed into the sum of its individual components 
[17, Chapter 15]. We obtained index.html from Web sites and 
focused our attention on the main components of download 
time: DNS lookup time, TCP connection setup time, and 
HTTP transfer time. We first examine measurement reuse 
across the temporal and spatial axes and later show how 
measurements made by another application can trigger new 
measurements, demonstrating application-level partial reuse 
of measurements. 


Figure 4: Temporal stationarity of (a) TCP setup 
time and (b) HTTP transfer time: High stability 
across 2 hours and reasonable stability across days. 


Our study involved periodically downloading the home page 
of a set of popular Web sites from several PlanetLab nodes, 
geographically distributed around the globe, that we chose 
as client sites. We merged several popular Web site rankings 
(Netcraft [23], Mediametrix [20], and Fortune 500 [9]) to 
obtain the 88 most popular Web sites. From each client site we 
downloaded the home page of these sites roughly every 7.5 
minutes for 17 days between July 3-19, 2004. For downloading, 
we used httperf [21] and logged the DNS lookup time, 
TCP connection setup time, and HTTP transfer time. We 
divided the data into two parts: July 3rd to 11th and July 
12th to 19th. We analyzed the data in each part in isolation, 
and the inferences that we drew based on the data in the 
first part were confirmed on the data in the second part. 


6.1.1 Reusability across time 


We analyzed the data collected to study the temporal and 
spatial stationarity of each of the components of download 
time. For temporal stationarity we looked at the variability 
of each component over various timescales. As a benchmark 
for variability, we considered the intrinsic variability of each 
component, i.e., by how much two successive measurements 
of the same component vary. We computed the ratio 
of every pair of successive measurements and found the value 
below which more than 80% of these ratios lie. We determined 
this value for each component for every (client, Web 
site) pair and refer to it as the intrinsic variability threshold 
for that component of the download time. We then computed 
the ratios of every pair of measurements 15 minutes, 
30 minutes, 1 hour, 2 hours, 6 hours, 12 hours, 1 day, 2 days, 
and 4 days apart and computed the variability threshold 
over each of these timescales. Figure 4(a) shows, for each of 
these timescales, the number of Web sites on each PlanetLab 
node for which the variability threshold of TCP connection 
setup time at that timescale was less than the intrinsic 
variability threshold. On most PlanetLab nodes, the variability 
of the TCP setup time even across days is the same 
as that observed in successive measurements. Thus, connection 
setup time is quite stable over multiple days. However, 
near-perfect stationarity across all PlanetLab nodes is 
observed only across a couple of hours. Figure 4(b) shows that 
HTTP transfer time exhibits similar stability. 


Figure 5: Bi-modal distribution of DNS lookup time 
(x-axis: index of PlanetLab node) 

DNS lookup time did not exhibit similar stability, 
displaying instead a bimodal distribution, with the two values 
representing DNS lookups performed with a cache hit and a 
cache miss. Figure 5 shows that on each PlanetLab node, 
the fraction of DNS times clustered around the 25th and 
75th percentiles (within either 100% or 10ms) is high 
for almost all Web sites, but the number of Web sites with 
more than 60% of the measurements clustered around just 
the 25th percentile is significantly smaller. These cache misses 
were due to low cache hit rates in the nameservers, which 
arose because the nameservers used on the PlanetLab nodes were 
different from those used by other machines at the same site. 
Since DNS time exhibits a bimodal distribution, we can use 
a randomized algorithm to predict which of the two modal 
values the next DNS lookup would take. 
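
The text does not specify the randomized algorithm, so the following is only one plausible sketch under stated assumptions: the two modal values are taken to be the 25th and 75th percentiles of past lookups, and the lower mode is predicted with probability equal to the fraction of past samples that lie closer to it. All names are hypothetical.

```python
import math
import random

def percentile(sorted_xs, q):
    """Nearest-rank percentile of an already sorted list (q in [0, 100])."""
    k = max(0, math.ceil(q / 100 * len(sorted_xs)) - 1)
    return sorted_xs[k]

def fit_bimodal(samples_ms):
    """Return (low_mode, high_mode, p_low) estimated from past lookup times."""
    xs = sorted(samples_ms)
    low, high = percentile(xs, 25), percentile(xs, 75)
    # Assumption: a sample belongs to the mode it is closer to (ties go low).
    near_low = sum(1 for x in xs if abs(x - low) <= abs(x - high))
    return low, high, near_low / len(xs)

def predict_next(model, rng=random):
    """Randomly pick one of the two modal values, weighted by history."""
    low, high, p_low = model
    return low if rng.random() < p_low else high

history_ms = [2, 2, 2, 3, 2, 80, 82, 2, 81, 2]
model = fit_bimodal(history_ms)
guess = predict_next(model, random.Random(0))
```

Here `model` is `(2, 80, 0.7)`: most lookups were cache hits near 2 ms, so the predictor returns the low mode about 70% of the time.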


6.1.2 Reusability across space 


To explore spatial stationarity, we studied whether each 
of the TCP connection setup and HTTP transfer time 
components from a particular Web site on a PlanetLab node can 
be predicted given the value of that component for the same 
Web site on another PlanetLab node close to it¹. As TCP 
connection setup and HTTP transfer time were observed to 
be highly stable over time, we considered the median value 
to be representative of the distribution. On each PlanetLab 
node, we computed the median value for both components 
of download time to each Web site. We then computed, for 
both components, the correlation coefficient across the 88 
Web sites for each pair of PlanetLab nodes. Figure 6 shows 
that the correlation coefficient is very close to 1 for almost 
all PlanetLab node pairs within 20ms of each other. 


¹ We do not expect DNS time to exhibit spatial stationarity. 


Figure 6: Spatial stationarity of (a) TCP connection 
setup time and (b) HTTP transfer time: Both show 
strong correlation when client nodes are within 20 
ms of each other. 
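
The per-pair correlation described above can be sketched as follows. This is an illustration only: it assumes the Pearson correlation coefficient (the text says only "correlation coefficient"), and the data layout and function names are hypothetical.

```python
import math
from statistics import median

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def spatial_correlation(node_a, node_b):
    """Correlate per-site medians of one component measured at two nodes.

    node_a, node_b: dict mapping Web site -> list of measured values (ms).
    Only sites measured from both nodes are compared.
    """
    sites = sorted(set(node_a) & set(node_b))
    med_a = [median(node_a[s]) for s in sites]
    med_b = [median(node_b[s]) for s in sites]
    return pearson(med_a, med_b)

node_a = {"s1": [10, 12, 11], "s2": [20, 22, 21], "s3": [30, 29, 31]}
node_b = {"s1": [21, 25, 23], "s2": [41, 45, 43], "s3": [61, 59, 63]}
r = spatial_correlation(node_a, node_b)
```

In this toy input the medians at the second node are an exact linear function of those at the first, so `r` is 1 up to floating-point error.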

Our study showed that the TCP setup time and HTTP 
transfer time components of download time are highly sta- 
ble over the period of a couple of hours, and that they are 
highly correlated for nearby sites. A similar study could 
be performed for the DNS availability application too. To 
examine temporal and spatial stationarity, DNS lookups to 
a set of Web sites need to be performed repeatedly over a 
period of time from a distributed set of sites. The results 
of these would then have to be analyzed to determine the 
probability that a DNS lookup fails at time t given that it 
failed at time t’, and the probability that a DNS lookup 
fails at site s given that it failed at site s’. We have not yet 
carried out a measurement study of such a nature. 


6.1.3 Reusability across applications 


To demonstrate reusability of measurements across 
applications, we show how measurements made in an unrelated 
application can be partially reused to trigger measurements 
in the performance-based ranking application. We consider 
a multi-user interactive application where tens of thousands 
of users from across the world connect to a single server 
cluster. This application generates a log file every minute 
containing the IP addresses of the set of clients currently 
connected to the servers. Our objective was to explore if 
anomalies observed in this application can be used to 
trigger new measurements in the ranking application. 


Figure 7: (a) A sharp increase in number of clients 
lost is seen in all ASes indicating a problem with the 
server. (b) A sharp rise in number of clients lost is 
seen in a single AS indicating a problem in that AS 


We compared every log file with that generated 3 minutes 
prior to it, and determined the instances where the number 
of users taking part in the application drops by more than 
10%. Our hypothesis was that a sharp drop in the number of 
users is more likely to be due to a network or server outage 
rather than users voluntarily leaving the application. We 
identified 8 such instances during the period from April to 
July 2004, the period for which we had data. We determined 
the set of clients that were lost during each of these 3 minute 
periods and then made use of BGP data to determine the 
AS-level paths from each of these clients to the server. Based 
on this, we computed for every AS, the number of clients 
that were lost whose paths pass through it. 
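
The drop detection and per-AS attribution described above can be sketched as follows. This is a simplified illustration: the per-minute logs are modeled as sets of client IP addresses, the AS-level paths are assumed to be available as a precomputed dictionary (the study derived them from BGP data), and all names and addresses are hypothetical.

```python
from collections import Counter

def find_drops(snapshots, lag=3, drop_fraction=0.10):
    """Indexes i where the client count fell by more than `drop_fraction`
    relative to the snapshot taken `lag` minutes earlier."""
    hits = []
    for i in range(lag, len(snapshots)):
        before, now = snapshots[i - lag], snapshots[i]
        if before and (len(before) - len(now)) / len(before) > drop_fraction:
            hits.append(i)
    return hits

def lost_clients_per_as(before, now, as_path_of):
    """For every AS, count the lost clients whose path to the server crosses it.

    as_path_of: dict mapping client IP -> list of ASes on its path (assumed input).
    """
    counts = Counter()
    for ip in before - now:
        for asn in set(as_path_of.get(ip, ())):  # count each AS once per client
            counts[asn] += 1
    return counts

full = {"192.0.2.%d" % i for i in range(1, 11)}   # 10 connected clients
reduced = full - {"192.0.2.9", "192.0.2.10"}      # 2 clients disappear
snapshots = [full, full, full, reduced]
paths = {"192.0.2.9": ["AS1", "AS7"], "192.0.2.10": ["AS1"]}

drops = find_drops(snapshots)
per_as = lost_clients_per_as(snapshots[0], snapshots[3], paths)
```

A spike in `per_as` that is confined to one AS points at that AS, whereas a spike spread over every AS points at the server cluster.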

In one of the instances, a significant increase in the 
number of lost clients was associated with every AS (Figure 
7(a)). Since it is highly improbable that a catastrophic event 
occurred in all ASes² simultaneously, this indicates that the 
problem was most probably with the server cluster. This 
information is of no value for the performance-based ranking 
application. On the other hand, Figure 7(b) shows another 
instance where a significant increase in the number of lost 
clients is seen in only AS 1, clearly indicating that there has 
been some event in that particular AS. 

These observations indicate that the measurements made 
by the multi-user interactive application can be utilized to 
build an AS health index. Suppose such an index is available 
to the performance-based ranking application. To make use 
of this information, the ranking application needs to have 
a priori input that when the health of some AS is “low”, 
the TCP connection setup time and HTTP transfer time to 
any Web site, the path to which passes through that AS, 
could possibly have changed. To be able to make use of this 
information, whenever the ranking application makes an es- 
timate for the download time to a particular Web site, it 
also determines the AS-level path to the Web site. Suppose 
this estimate for download time is reused in the future based 
on temporal and spatial stationarity. If the health of one of 
the ASes along the path to the Web site has dropped signif- 
icantly, then the application infers that the TCP setup time 
and HTTP transfer time components of the historical mea- 
surement are suspect and a new measurement might have 
to be triggered. This is an instance of how partial reuse of 
measurements across applications is possible. 


6.2 Details of Application Deployment 


6.2.1 Data Sources 


We now briefly describe the measurement data sources 
that we utilized in the deployment of the two applications 
we chose - performance-based ranking of Web sites and mon- 
itoring of DNS server availability. 


• The Gigascope [5] packet monitoring system, which 
is a highly configurable high speed network monitor, 
was deployed at 3 locations within the US: 2 of them 
at a site in the South-West and the third at a location 
in the North-East. The Gigascope in the North-Eastern 
part, which we refer to as gs1 henceforth, is 
monitoring a 45Mbps Internet access link connecting 
approximately 300 researchers to the Internet. The 
Gigascopes in South-Western US, which we refer to as 
gs2 and gs3 henceforth, are monitoring a cable access 
network used by a few thousand cable access customers 
with a typical traffic volume of 200Mbps. 


On these nodes, Gigascope passively monitors DNS 
requests and responses, binning these requests and re- 
sponses into time bins of 10 seconds. In each time 
bin, Gigascope generates a tuple of DNS request and 
response count seen for every DNS name for which a 
request or response was observed in that time bin. 


• We constructed an AS health index using the logs of 
the multi-user interactive application outlined above. 
These logs are generated by a Gigascope running at 
the server, but we built the index based on the logs 
stored elsewhere, to which the log files are transferred 
from the server with a few minutes delay. The basic 
intuition we employed in developing this index is 
that the “health” of an AS is “bad” if a significant 
increase in the number of lost IPs was observed “only” 
in that particular AS. 


² AS numbers have been anonymized. 


• 68 nodes in the PlanetLab testbed [26] were employed 
for performing active measurements, supporting 3 
different measurements: 


Measuring download time from a set of Web 
sites. Along with the download times, the IP addresses 
from which the downloads were performed were also 
returned, so that traceroutes to the responding servers 
could optionally be performed. 


Performing DNS lookups for a set of DNS 
names. Along with the status of the lookup 
(success/failure), the TTL values in the responses were also 
returned to detect whether any of the responses came from 
the cache. 


Performing AS-level traceroute to a set of IP 
addresses. We used the IP to AS mapping generated 
in [18] in combination with a tool for longest-prefix 
matching, built on the AT&T Data Stream Scan tool 
(dss), to return the AS-level paths to these addresses. 


Scripts in either Perl or Ruby were written to perform 
each of the above measurements. 


Note that this set of data sources includes querying from 
passive measurements and from historical measurements, 
and triggering of active measurements. In all of these cases, 
the DSMI had to execute the same sequence of actions. On 
receiving the REQ message, it parses the message and in- 
vokes the appropriate command or appropriate query. The 
DSMI then reads in its output, constructs the RESP mes- 
sage and sends it back to the node that issued the request. 
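
The REQ/RESP handling sequence described above can be sketched as follows. The wire format of the REQ and RESP messages is not given in this section, so this sketch assumes a hypothetical JSON encoding; the field names (`id`, `param`, `args`, `status`, `output`), the command table, and the function name are all illustrative and not part of ATMEN.

```python
import json
import subprocess
import sys

# Hypothetical table mapping a measurement parameter to the command that
# produces it. The single entry just echoes its argument, for demonstration.
COMMANDS = {
    "echo_test": [sys.executable, "-c", "import sys; print(sys.argv[1])"],
}

def handle_req(raw_req, commands=COMMANDS, timeout_s=30):
    """Parse a REQ, run the matching command, and build the RESP message."""
    req = json.loads(raw_req)
    argv = commands.get(req.get("param"))
    if argv is None:
        return json.dumps({"id": req.get("id"), "status": "error",
                           "output": "unknown parameter"})
    # Launch the script/command and read in its output.
    result = subprocess.run(argv + [str(a) for a in req.get("args", [])],
                            capture_output=True, text=True, timeout=timeout_s)
    status = "ok" if result.returncode == 0 else "error"
    return json.dumps({"id": req.get("id"), "status": status,
                       "output": result.stdout.strip()})

resp = json.loads(handle_req(json.dumps(
    {"id": 1, "param": "echo_test", "args": ["example.com"]})))
```

The same handler shape covers all three kinds of data source, since triggering an active measurement and querying passive or historical data both reduce to launching a command and relaying its output.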


6.2.2 Applications 


We now outline the working of the application code that 
retrieved measurement data from the above sources and trig- 
gered new measurements when required. In the performance- 
based ranking application, once every 2 hours, the applica- 
tion polls a random subset of the 68 PlanetLab nodes, re- 
questing them to measure the download time from each of 
10 popular Web sites. The subset of PlanetLab nodes is 
chosen such that no two are within 20 ms latency of each 
other. As mentioned before, along with the download time, 
the PlanetLab node also returns the IP address the down- 
load was performed from. The application then requests the 
AS-level paths to these IP addresses and stores the AS-level 
path for every (PlanetLab node, Web site) pair. 
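
One way to choose a random subset of nodes with no two within 20 ms of each other is a greedy pass over a shuffled node list. The text does not say how the subset was selected, so this is a sketch under that assumption; the function name and the latency lookup are hypothetical.

```python
import random

def pick_spread_nodes(nodes, latency_ms, min_gap_ms=20.0, rng=random):
    """Greedily keep nodes that are more than `min_gap_ms` from all kept nodes.

    latency_ms: function (node_a, node_b) -> latency between them in ms.
    """
    order = list(nodes)
    rng.shuffle(order)  # random order makes the resulting subset random
    chosen = []
    for node in order:
        if all(latency_ms(node, kept) > min_gap_ms for kept in chosen):
            chosen.append(node)
    return chosen

# Toy latency table: a and b are close to each other, c is far from both.
table = {frozenset(("a", "b")): 5.0,
         frozenset(("a", "c")): 50.0,
         frozenset(("b", "c")): 48.0}
lookup = lambda x, y: table[frozenset((x, y))]
subset = pick_spread_nodes(["a", "b", "c"], lookup, rng=random.Random(7))
```

Whatever the shuffle order, the toy example yields `c` plus exactly one of `a` and `b`. A greedy pass does not necessarily find the largest possible subset, which is acceptable when any well-spread sample will do.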

Once every 10 minutes, this application also polls the AS 
health index. When the application finds the health of some 
AS to have fallen below a threshold, it looks up the stored 
AS-level paths returned previously by the PlanetLab nodes 
and triggers new download time measurements from each 
PlanetLab node to the Web sites, the paths to which pass 
through the “low-health” AS. 
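
The health-triggered lookup described above can be sketched as follows. The data layout and function name are hypothetical; the 0.9 threshold is the value reported for the deployment in Section 7.1.

```python
def measurements_to_trigger(as_health, stored_paths, threshold=0.9):
    """Return the (PlanetLab node, Web site) pairs whose stored AS-level
    path crosses an AS whose health has fallen below `threshold`.

    as_health:    dict AS -> current health value
    stored_paths: dict (node, site) -> list of ASes on the stored path
    """
    low_health = {asn for asn, health in as_health.items() if health < threshold}
    return sorted(pair for pair, path in stored_paths.items()
                  if low_health.intersection(path))

stored_paths = {
    ("node1", "siteA"): ["AS1", "AS2"],
    ("node1", "siteB"): ["AS3"],
    ("node2", "siteA"): ["AS2", "AS4"],
}
as_health = {"AS1": 1.0, "AS2": 0.5, "AS3": 0.95, "AS4": 1.0}
to_remeasure = measurements_to_trigger(as_health, stored_paths)
```

Only the two pairs whose paths cross AS2 are selected, so the remaining stored estimates continue to be reused.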

In the DNS availability monitoring application, the 
application initially subscribes to the 3 Gigascopes. It then 
continuously receives tuples, each of which contains a count 
of the number of DNS requests and responses observed in 
that time bin for some DNS name. When the application 
observes that for some DNS name, the number of DNS 
requests and DNS responses do not match up over consecutive 
time bins³, it polls a random one-third of the 68 PlanetLab 
nodes to perform a DNS lookup for that DNS name. 


Figure 8: Distribution of delays as observed by the 
application and by the DSMI for making (a) down- 


Figure 9: Overhead due to system for measurements 
made in the performance-based ranking application 
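
The trigger rule of the DNS availability application can be sketched as follows. The exact rule is not spelled out in the text, so this sketch assumes that a lookup is triggered when the request and response counts for a name differ in two adjacent time bins; the class and function names are hypothetical.

```python
import random

class MismatchDetector:
    """Tracks, per DNS name, runs of adjacent bins with unequal counts."""

    def __init__(self, consecutive=2):
        self.consecutive = consecutive
        self.state = {}  # name -> (last bin index seen, current mismatch streak)

    def observe(self, name, bin_index, requests, responses):
        """Feed one tuple; return True when active lookups should be triggered."""
        last_bin, streak = self.state.get(name, (None, 0))
        if requests == responses:
            streak = 0
        elif streak > 0 and last_bin is not None and bin_index == last_bin + 1:
            streak += 1   # mismatch continues in the adjacent bin
        else:
            streak = 1    # first mismatch, or a gap since the previous one
        self.state[name] = (bin_index, streak)
        return streak >= self.consecutive

def pick_probe_nodes(nodes, rng=random):
    """Choose a random one-third of the nodes to perform the DNS lookup."""
    nodes = list(nodes)
    return rng.sample(nodes, max(1, len(nodes) // 3))

detector = MismatchDetector()
first = detector.observe("example.com", 10, requests=3, responses=1)
second = detector.observe("example.com", 11, requests=2, responses=0)
probes = pick_probe_nodes(range(68), random.Random(1))
```

Requiring more than one mismatching bin avoids triggering on a response that merely arrives in the bin after its request.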


7. RESULTS 


We now present a summary of the results obtained from 
our deployment of the two applications — performance-based 
ranking and DNS availability monitoring. These results 
were collected based on the deployment of both applications 
over a period of 7 days between November 3-9, 2004. Our 
primary objectives in the evaluation of either application 
were to quantify the savings due to our triggered measure- 
ment setup and to demonstrate the low overhead of our sys- 
tem. When an App requests measurement data from a DS, 
the minimum delay it will experience is the RTT between 
the two sites added to the time taken by the DS to either 
perform an active measurement or query from ongoing pas- 
sive or stored historical measurements. We study both the 
applications we deployed to determine what overhead our 
system adds over and above this minimum delay. 


³ Note that irrespective of the size of the time bin, the response 
for a request need not necessarily be in the same time bin. 


7.1 Performance-based ranking 


Savings in this application were implicit because the setup 
drew upon the results of our study of the temporal and spa- 
tial stationarity of download time. Due to temporal station- 
arity, we measured the download time from each of the 10 
Web sites only once every 2 hours and due to spatial station- 
arity, these measurements were made only from around 20 of 
the 68 PlanetLab nodes whenever a measurement needed to 
be made. Based on our prior experience with the AS health 
index, we set the threshold for identifying a “low-health” AS 
to 0.9. During the 7-day period, there was not a single 
instance when the health of an AS fell below 0.9, 
so no download time measurements had to be triggered in 
addition to those made once every 2 hours. 

To quantify the overhead of ATMEN, we measured the 
delay in obtaining measurement data as seen from within the 
application and as seen from within the DSMI. The delay 
experienced by the application is the time between when 
the application makes a getMeasurementValue call on the 
AMI to when the application detects that it can read in 
data on the file descriptor returned to it by the AMI. The 
delay seen from within the DSMI is the time of execution of 
the script/command it launches to obtain the measurement 
data. The difference between these two delays minus the 
RTT between the sites hosting the application and the data 
source is the overhead of the system. 
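
The overhead definition above amounts to a single subtraction per measurement. A minimal sketch follows; the function name and the sample values are illustrative and are not measured data.

```python
def system_overhead(app_delay_s, dsmi_delay_s, rtt_s):
    """Overhead = (delay seen by the application - delay seen by the DSMI) - RTT."""
    return (app_delay_s - dsmi_delay_s) - rtt_s

# Illustrative values: the application waited 1.250 s, the DSMI's command ran
# for 1.100 s, and the RTT between the two sites was 0.080 s.
overhead_s = system_overhead(1.250, 1.100, 0.080)
```

With these numbers the infrastructure itself accounts for about 0.070 s of the 1.250 s the application waited.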

In the performance-based ranking application, there are 
three kinds of data sources—download time, AS traceroute 
and AS health. The machine on which we ran the AS health 
source is in the same subnet as the one on which we ran the 
application. Hence, we only consider the overhead in ob- 
taining measurement data from the download time and AS 
traceroute sources which are hosted on the PlanetLab nodes. 
Figure 8 shows the distribution of the delay as seen by the 
application and as seen by the DSMI, both for download 
time and AS traceroute measurements. The two distributions 
are almost identical for either measurement parameter, 
which indicates that the overhead due to the system is 
extremely low. We further confirm this in Figure 9, which plots, 
for either kind of measurement, the distribution of the 
difference between the two delays minus the RTT between the 
node hosting the application and the PlanetLab node that 
made the measurement. The range of values taken by the 
overhead, in comparison with the range of values taken by 
the overall delay, shows that the overhead due to the system 
constitutes only a minute fraction of the overall delay. 


7.2 DNS availability monitoring 


Over the 7-day deployment period, the DNS availability 
application spanned 59728 time bins. During this period, 
822383 tuples were generated by gs1 and 3218237 tuples 
were generated by gs2 and gs3. Based on this data, the 
application triggered 86318 and 557796 DNS lookups, re- 
spectively. Due to our setup of triggering active measure- 
ments based on measurements made passively, we obtain 
more than 80% savings in the number of DNS lookups that 
need to be performed. 
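
The savings figure can be checked against the counts reported above. The baseline is not stated explicitly, so this sketch assumes it is one active lookup per passively observed tuple; that reading is ours, not a statement from the text.

```python
def savings(tuples_observed, lookups_triggered):
    """Fraction of lookups avoided, relative to one lookup per observed tuple."""
    return 1.0 - lookups_triggered / tuples_observed

gs1_savings = savings(822383, 86318)       # tuples and lookups for gs1
gs23_savings = savings(3218237, 557796)    # tuples and lookups for gs2 and gs3
```

Under this assumption the savings are roughly 0.895 for gs1 and 0.827 for gs2 and gs3, both consistent with the stated figure of more than 80%.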

As in the performance-based ranking application, we plot 
the distribution of the delays in obtaining measurement data 
as seen by the application and as seen by the DSMI. 
Figure 10 shows these distributions for DNS lookup 
measurements that were triggered based on data from gs1, and from 
gs2 and gs3, respectively. The range of delays observed in 
the latter case is greater because a greater number 
of lookups is triggered per request due to the higher traffic 
volume going through gs2 and gs3 per time bin. As in the 
performance-based ranking application, these graphs also 
show that the distributions of delays as seen by the 
application and as seen by the DSMI are almost identical. We 
again computed the overhead due to the system by subtracting 
the RTT between the machine hosting the application 
and the PlanetLab node on which the DNS lookup was made 
from the difference between the delays seen by the 
application and the DSMI. Figure 11 plots the distribution of this 
overhead. In this application too, we see that the overhead 
constitutes only a minor fraction of the overall delay. 


ments made in the DNS availability application 


8. CONCLUSIONS AND FUTURE WORK 


We designed and implemented ATMEN—a triggered net- 
work measurement infrastructure. We used it to construct 
two Web applications — ranking of Web sites according to 
download time and monitoring of DNS server availability. 
Our study of temporal and spatial stationarity of the 
components of download time showed the significant potential for 
measurement reuse in the performance-based ranking 
application. We also demonstrated partial reuse of measurements 
across completely unrelated applications. Results from our 
deployment of both applications over a 7-day period showed 
the significant savings in a triggered measurement setup. 
The low overhead observed due to our system demonstrated 
its scalability. 

In subsequent work [16], we explored the potential for 
reusing measurements by applying a technique for detecting 
changes in multi-dimensional streams (based on the 
Kullback-Leibler distance metric). Using data we gathered during our 
measurement study, we showed that domain-independent 
techniques are useful in analyzing the stationarity of mea- 
surements. 

We are improving the AMI to further reduce the com- 
munication overhead. The AMI could cache results received 
from the name server and from different DSes and store them 
in a standard configuration file to share across Apps. The 
AMI can take advantage of the potential for measurement 
reuse transparent to the App. When the App requests data 
for some measurement parameter, the AMI can fetch the 
data for component measurements that the parameter can 
be broken down into, compose the values obtained for these 
components and hand the value for the requested parame- 
ter back to the App. We are also exploring the complexity 
involved in supporting wildcards as part of the predicates 
specified in the REQ message. 

Many security techniques have been developed to protect 
the privacy and integrity of TCP connections [29] as well as 
to authenticate and authorize [1] endpoints. We are looking 
into how these techniques can be applied to secure ATMEN. 
In other future work, we are examining a range of applica- 
tions that could benefit from ATMEN including providing 
early warning for attacks when seen in some part of the 
network. 